Text mining: identification of similarity of text documents using hybrid similarity model

Abstract

The volume of data accessible on the internet has increased dramatically, and this growth will only accelerate as more devices are connected to the network. Part of this data consists of text documents from various sources. As digital sources multiply, it becomes difficult to identify the relevant information that is most needed for further use. The goal of this research is to present a hybrid similarity algorithm that uses text summarization techniques to identify papers that are similar in terms of both semantic and contextual similarity. Some methods aim to quantify a corpus's polysemy quotient using deep learning with numerous layers and prebuilt Natural Language Processing (NLP) models to determine document similarity. In comparison with other conventional algorithms, experimental results for our model showed an accuracy of 76.25%.
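The abstract does not say how the semantic and contextual scores are computed or combined. As a rough illustration only, the sketch below assumes a simple weighted sum of two stand-in measures: bag-of-words cosine similarity in place of the semantic component, and overlap of adjacent word pairs in place of the contextual component. The function names and the `alpha` weight are hypothetical and are not taken from the paper.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between two bag-of-words term-frequency vectors."""
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[t] * counts_b[t] for t in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def bigram_jaccard(tokens_a, tokens_b):
    """Jaccard overlap of adjacent word pairs, a crude proxy for shared context."""
    bigrams_a = set(zip(tokens_a, tokens_a[1:]))
    bigrams_b = set(zip(tokens_b, tokens_b[1:]))
    union = bigrams_a | bigrams_b
    if not union:
        return 0.0
    return len(bigrams_a & bigrams_b) / len(union)


def hybrid_similarity(doc_a, doc_b, alpha=0.5):
    """Weighted mix of the two scores; alpha is a hypothetical tuning weight."""
    tokens_a, tokens_b = tokenize(doc_a), tokenize(doc_b)
    lexical = cosine_similarity(tokens_a, tokens_b)
    contextual = bigram_jaccard(tokens_a, tokens_b)
    return alpha * lexical + (1 - alpha) * contextual


if __name__ == "__main__":
    print(hybrid_similarity("the cat sat", "the cat ran"))
```

A model like the one the abstract describes would presumably replace both stand-ins with learned representations; the sketch only shows the shape of a hybrid score that blends two complementary signals into one value between 0 and 1.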

Similar resources

Text-to-text Similarity of Sentences

Assessing the semantic similarity between two texts is a central task in many applications, including summarization, intelligent tutoring systems, and software testing. Similarity of texts is typically explored at the level of word, sentence, paragraph, and document. The similarity can be defined quantitatively (e.g. in the form of a normalized value between 0 and 1) and qualitatively in the fo...

A Fuzzy Similarity Based Concept Mining Model for Text Classification

Text classification is a challenging and highly active field of great importance in text categorization applications. A lot of research work has been done in this field, but there is a need to categorize a collection of text documents into mutually exclusive categories by extracting the concepts or features using a supervised learning paradigm and different classification ...

Lexical Acquisition for Clinical Text Mining Using Distributional Similarity

We describe experiments into the use of distributional similarity for acquiring lexical information from clinical free text, in particular notes typed by primary care physicians (general practitioners). We also present a novel approach to lexical acquisition from ‘sensitive’ text, which does not require the text to be manually anonymised – a very expensive process – and therefore allows much la...

Text Reuse Detection using a Composition of Text Similarity Measures

Detecting text reuse is a fundamental requirement for a variety of tasks and applications, ranging from journalistic text reuse to plagiarism detection. Text reuse is traditionally detected by computing similarity between a source text and a possibly reused text. However, existing text similarity measures exhibit a major limitation: They compute similarity only on features which can be derived ...

Efficient Hybrid Semantic Text Similarity using Wordnet and a Corpus

Text similarity plays an important role in natural language processing tasks such as answering questions and summarizing text. At present, state-of-the-art text similarity algorithms rely on inefficient word pairings and/or knowledge derived from large corpora such as Wikipedia. This article evaluates previous word similarity measures on benchmark datasets and then uses a hybrid word similarity...

Journal

Journal title: Iran Journal of Computer Science

Year: 2022

ISSN: 2520-8438, 2520-8446

DOI: https://doi.org/10.1007/s42044-022-00127-4